ABSTRACT

Recent work on structured P2P systems exploits distributed hash tables (DHTs) to support publish/subscribe (Pub/Sub) protocols. Some of these approaches require a so-called rendezvous node where subscriptions meet events, which easily creates bottlenecks. By contrast, unstructured P2P systems need not maintain a fixed network topology, which makes them robust. We present an ontology-based Pub/Sub event routing mechanism, called UP2S2, for modeling and implementing the architecture of a large-scale unstructured P2P Pub/Sub system. According to subscription deviation, UP2S2 is divided into multiple subnets. Each subscription subnet has a subscription probability tree. Events are forwarded toward the most likely subscription nodes. We design and implement the algorithms for UP2S2 to construct and reconstruct subscription routing tables, and derive conclusions from simulation experiments. The results show that UP2S2 routes events more quickly and accurately than a broad flooding (BF) routing strategy.
I. INTRODUCTION

Pub/Sub is an asynchronous communication paradigm that supports many-to-many interactions among a set of clients. Pub/Sub communication can be anonymous, with participants decoupled in space, flow, and time. A Pub/Sub system involves three kinds of participants: publishers, who publish events; subscribers, who register their interests with the system; and an event broker network, which matches events and delivers them to the corresponding subscribers. When an event is published, the Pub/Sub system does not name a specific subscriber. Instead, each broker node that receives an event decides, based on the event content, to which nodes the event should be propagated next. For this reason, routing in a Pub/Sub system is also called content-based routing (CBR). A routing algorithm determines how to find an appropriate path in the event broker network and how to route events to the relevant subscribers efficiently and reliably at low cost. Accuracy of event routing and network efficiency are the most important design goals, since they determine the size and scalability of the network.

P2P networking is a distributed application architecture that partitions tasks or workloads among peers. P2P networks can be classified as structured or unstructured. Structured P2P systems impose a specific linkage structure between nodes. In contrast, in unstructured P2P systems, peers are not linked according to a predefined deterministic scheme. Instead, links are created either randomly or probabilistically, based on some proximity metric between nodes. Because of advantages such as rapid information dissemination and content-based routing, unstructured P2P networks have been used to construct Pub/Sub systems in order to improve the efficiency of event routing.

The rest of the paper is organized as follows. In Section 2, we review related work. Our main methods, including the algorithms for constructing and reconstructing subscription routing tables in UP2S2, are presented in Section 3. Validation and evaluation are given in Section 4. Finally, the paper is concluded in Section 5.

II. RELATED WORK 

Enabling Pub/Sub services in P2P systems has become an interesting topic in recent years. Structured P2P systems have an advantage in search efficiency. To provide subject-based Pub/Sub services, data and queries on a given topic can easily be mapped and routed to the same node using a DHT. Chaabane proposed an approach based on a community-wide ontology that allows publishers and subscribers to use a common semantic space to characterize the production and consumption of resources and services. The ontology-based community topics are then used to compute keys for routing events in a DHT [1]. Setty presented PolderCast, a P2P architecture for topic-based Pub/Sub that aims to achieve relay-free, fast, and robust dissemination over a scalable overlay with minimal maintenance cost [2]. Pandey presented an architecture of distributed mobile brokers that are dynamically reconfigurable in the form of a structured P2P overlay and act as rendezvous points for matching publications and subscriptions [3]. Pellegrino introduced two versions of a content-based Pub/Sub matching algorithm for events described in RDF, working on an adapted version of the CAN structured P2P network designed to both store and disseminate RDF events [4]. Einziger introduced Postman, a Pub/Sub architecture tailored for self-sustained, service-independent P2P networks. Postman is designed to provide its users with a self-organizing, scalable, efficient, and churn-resilient Pub/Sub service [5]. Detti discussed the benefits of building Pub/Sub systems for mobile ad hoc networks on content-centric networking technology rather than TCP/IP, presented different design approaches, and described a topic-based Pub/Sub content-centric networking system [6]. Tryfonopoulos studied distributed resource sharing in P2P networks with a focus on information filtering, used an extension of the Chord DHT to organize the nodes and store user subscriptions, and utilized efficient publication protocols that keep network traffic and latency low at filtering time [7].

In contrast to structured P2P-based Pub/Sub systems, unstructured Pub/Sub services can be provided on deliberately designed overlay topologies. In unstructured P2P systems, subscribers and publishers rely on distributing queries and data messages throughout the network to make a successful subscription. Rahimian introduced Vitis, which exploits two ostensibly opposite mechanisms: unstructured clustering of similar peers and structured rendezvous routing. A gossiping technique is used to embed a navigable small-world network, which efficiently establishes connectivity among clusters of nodes that exhibit similar subscriptions [8]. Chacko proposed CoQUOS, an approach that supports continuous queries in unstructured P2P networks, and extended it with consistency maintenance to address problems of flexibility and complexity [9]. Papadakis proposed ITA, an algorithm that creates a random overlay of randomly connected neighborhoods, providing topology awareness to P2P systems without negatively affecting the self-* properties or the operation of other P2P algorithms [10]. Klusch presented the mobile system MyMedia, which features high-performance semantic P2P search and dynamic adaptive live streaming of annotated MPEG-DASH videos between mobile devices over HTTP in wireless networks with an unstructured, semantic P2P overlay [11]. Baraglia proposed a general architecture for a system that exploits the collaborative exchange of information between peers in order to gather similar users and spread useful suggestions among them, and presented both simple and more complex mechanisms for building communities [12]. Margariti considered flooding, a fundamental mechanism for network discovery and query routing in unstructured P2P networks, analyzed its behavior with respect to duplicate messages, and provided simple bounds and approximate models to assess the associated overheads [13]. Ferretti analyzed the performance of an unstructured P2P overlay network that uses a very simple dissemination strategy to build P2P Pub/Sub systems, and provided a mathematical analysis to estimate the number of nodes receiving an event [14]. Leng introduced replica maintenance and update mechanisms for the BubbleStorm P2P overlay and related rendezvous search systems. The complete solution, covering all identified use cases, includes a maintainer-based mechanism for data managed by a single node and a collective mechanism for data that must persist beyond any particular node's session time [15].

Recent work on structured P2P systems exploits DHTs to support CBR. Some of these approaches require a so-called rendezvous node where subscriptions meet events, which easily creates bottlenecks. By contrast, unstructured P2P systems need not maintain a fixed network topology, which makes them robust. However, unstructured P2P systems that rely on flooding are vulnerable to network storms and data overlap. In this paper, we propose a new unstructured P2P system suitable for large-scale Pub/Sub. The design guidelines are as follows.

1) We propose the concept of subscription ontology. Based on the similarity of subscriptions, the unstructured P2P network is divided into subnets.

2) Based on the subscription deviation, the routing strategy chooses the route for event flooding and forwards events in the most likely directions.

III. OUR METHODS

3.1 Subscription ontology 

In information science, an ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse. In an ontology tree, each node represents an independent concept, each edge is directed from a parent node to a child node, and child nodes represent sub-concepts of their parent nodes. In UP2S2, an event is expressed as an ontology tree, and a subscription is a tree pattern based on the ontology tree syntax. A tree pattern defines the shape of a tree as well as constraints on certain nodes and edges.

Definition 1 subscription probability tree. A subscription probability tree represents the distribution of node subscription probabilities in tree form. It is a mathematical structure used to describe the random output of a hierarchically structured subscription ontology.

By using subscription probability trees, we can effectively establish heuristic index information. This class of index information synchronizes quickly. It is completely decentralized, which suits unstructured P2P network topologies, and it can effectively prevent broadcast storms and information overlap.

Definition 2 subscription probability. If there is a directed path from node X to node Y, the subscription probability of node X to node Y, denoted SP(X,Y), is the product of the probabilities of all the directed edges on the path.
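
Written as a formula, and assuming the directed path from X to Y consists of the edges e_1, ..., e_k with edge probabilities p(e_1), ..., p(e_k) (these symbols are introduced here only for illustration and do not appear in the original), the definition can be stated as:

```latex
SP(X, Y) = \prod_{j=1}^{k} p(e_j)
```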



SP indicates the subscriber node X’s degree of interest in the events that are published by the publisher node Y. 

For a probability tree rooted at node A, we assume SP(A,B)=0.8, SP(A,C)=0.2, SP(B,D)=0.9, SP(B,E)=0.1, and SP(C,F)=SP(C,D)=0.5. If there is a path from node A to node E that passes through the directed edges (A,B) and (B,E), then SP(A,E)=SP(A,B)*SP(B,E)=0.8 × 0.1=0.08. If there is no path from node G to node B, then SP(G,B)=0.
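
The worked example can be checked with a short Python sketch. The edge dictionary and the function name `subscription_probability` are illustrative assumptions; the edge (C, D) from the text is left out here so that every node has a single parent and the path between two nodes is unique.

```python
import math

# Edge probabilities of the example tree rooted at A (values from the text).
EDGES = {
    ("A", "B"): 0.8,
    ("A", "C"): 0.2,
    ("B", "D"): 0.9,
    ("B", "E"): 0.1,
    ("C", "F"): 0.5,
}


def subscription_probability(edges, x, y):
    """Product of edge probabilities on the directed path x -> y, or 0.0
    if y cannot be reached from x."""
    if x == y:
        return 1.0  # empty path: the empty product (an assumption)
    for (parent, child), p in edges.items():
        if parent == x:
            rest = subscription_probability(edges, child, y)
            if rest > 0.0:
                return p * rest
    return 0.0


print(math.isclose(subscription_probability(EDGES, "A", "E"), 0.08))  # True
print(subscription_probability(EDGES, "G", "B"))  # 0.0
```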



Definition 3 subscription deviation. Subscription deviation is used to measure the degree of deviation between two 
subscription probability trees. Given P and Q to be two subscription ontologies, the subscription deviation 
between P and Q is defined as follows. 



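$$SD(P,Q) = \sum_{i=1}^{N} \delta_i$$

[The piecewise definition of the per-edge term, written here as \delta_i and expressed in the source through SPT(P,i) and SPT(Q,j), is not recoverable from the extracted text.]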




where N is the number of directed edges in the union of the edge sets of the two subscription probability trees corresponding to P and Q, and SPT(P,i) and SPT(Q,j) are the probability values of the corresponding edges in the two subscription probability trees. We write SPT(P,i)=a and SPT(Q,j)=b.

3.2 Subscription subnet 

We assume that all nodes in UP2S2 use the same type of subscription ontology SOI. According to the subscription ontology, each node establishes a subscription routing table that includes subscription probability values. When node S publishes a subscription condition, it does not yet have the subscription probability information of the other nodes in the system. UP2S2 therefore uses the broad flooding (BF) search algorithm, driven by the needs of the subscribers. If a node A receives a query message during the broad flooding search and finds a match in its subscription routing table, node A responds with a query hit message that carries its subscription routing table information, which the subscription node S uses to generate a subscription probability tree. At the end of the BF algorithm, node S has received the query hit messages sent by a group of nodes PG(S)={A_1, A_2, ..., A_n}, each with a subscription deviation SD(S, A_i), 1 ≤ i ≤ n. The subscription subnet algorithm extracts the subscription probability values from the query hit messages to construct the subscription probability tree of node S.
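
The query phase described above can be sketched as a hop-limited flood. This is only an illustration under stated assumptions: the adjacency-dict overlay, the `has_match` predicate standing in for the routing-table lookup, the hop limit `ttl`, and the suppression of duplicate query copies are not taken from the paper, which does not give the details of its BF search.

```python
from collections import deque


def broad_flooding(neighbors, source, ttl, has_match):
    """Flood a query from `source` for at most `ttl` hops and return the
    nodes that answer with a query hit, in breadth-first order."""
    visited = {source}
    hits = []
    frontier = deque([(source, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node != source and has_match(node):
            hits.append(node)  # node replies with a query hit message
        if hops == ttl:
            continue  # time to live exhausted: do not forward further
        for nxt in neighbors.get(node, ()):
            if nxt not in visited:  # drop duplicate copies of the query
                visited.add(nxt)
                frontier.append((nxt, hops + 1))
    return hits


# Toy overlay: S - A1 - A2 - A3, plus S - B.
overlay = {"S": ["A1", "B"], "A1": ["S", "A2"], "A2": ["A1", "A3"],
           "A3": ["A2"], "B": ["S"]}
matching = {"A1", "A3"}
print(broad_flooding(overlay, "S", 2, matching.__contains__))  # ['A1']
print(broad_flooding(overlay, "S", 3, matching.__contains__))  # ['A1', 'A3']
```

The returned list plays the role of PG(S): the nodes whose query hit messages reach S.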

The key to subscription routing is to establish the corresponding routing tables in the corresponding subscription subnets. The algorithm for UP2S2 to construct subscription routing tables is described as follows.

/* Formal parameters: the set of subscription subnet nodes: spn, current subscription node: cn, subscription deviation threshold value: t */

void procedure sub_routing_table_constructing(SPNs spn, CN cn, SDT t) {
    /* SRT: data type of the subscription routing table; psrt: pointer to the current subscription routing table */
    float u;
    struct SRT *psrt;

    psrt = malloc(sizeof(struct SRT));

    /* take each node x out of spn in turn until spn is empty */
    for (; ∀x ∈ spn, spn = spn - {x}; spn = ∅) do {
        u = SD(cn, x);
        if (u < t && routing_table(x) ∉ psrt->routing_table)
            /* Create a new subscription item routing_table(x) in the subscription table of node cn */
            psrt->routing_table = psrt->routing_table + routing_table(x);
    }
}
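
A runnable Python sketch of the same threshold filter is given below. The function and parameter names are illustrative, and `toy_sd` is a hypothetical stand-in for SD (the sum of absolute differences of edge probabilities over the union of edges); it is not the paper's deviation formula and is used only so that the example can execute.

```python
def construct_routing_table(subnet_nodes, current, threshold, sd, routing_entry):
    """Collect routing entries of all nodes whose deviation from `current`
    is below `threshold`, skipping entries that are already present."""
    table = []
    for x in subnet_nodes:
        u = sd(current, x)
        entry = routing_entry(x)
        if u < threshold and entry not in table:
            table.append(entry)
    return table


# Hypothetical stand-in for SD: sum of absolute differences of the edge
# probabilities over the union of edges (NOT the paper's formula).
def toy_sd(profiles):
    def sd(p, q):
        edges = set(profiles[p]) | set(profiles[q])
        return sum(abs(profiles[p].get(e, 0.0) - profiles[q].get(e, 0.0))
                   for e in edges)
    return sd


profiles = {
    "cn": {("r", "a"): 0.8, ("r", "b"): 0.2},
    "n1": {("r", "a"): 0.7, ("r", "b"): 0.3},
    "n2": {("r", "a"): 0.1, ("r", "b"): 0.9},
}
table = construct_routing_table(["n1", "n2"], "cn", 0.5, toy_sd(profiles),
                                lambda x: x)
print(table)  # ['n1']
```

Passing the deviation function as a parameter keeps the filter independent of how SD is actually defined.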

When a node exits or goes offline for external reasons, UP2S2 takes the following steps. Each subscription node associated with the offline node recalculates its subscription probabilities and updates the subscription probability tree of the publishing nodes. In the next subscription table construction cycle, the construction algorithm adds the new subscription nodes that are associated with the node to a new subscription table. The algorithm for UP2S2 to reconstruct subscription routing tables is described as follows.

/* Formal parameters: the set of old subscription subnet nodes: old_spn, the set of new subscription subnet nodes: new_spn, current subscription node: cn */

void procedure sub_routing_table_reconstructing(SPNs old_spn, SPNs new_spn, CN cn) {
    float u;
    SPNs xor_spn;

    /* symmetric difference: nodes that belong to exactly one of the two sets */
    xor_spn = old_spn ⊕ new_spn;

    /* for each node x, where x has already gone offline */
    for (; ∀x ∈ xor_spn, xor_spn = xor_spn - {x}; xor_spn = ∅) do {
        /* for each node y, where y has a subscription relationship with x */
        for (; ∀y ∈ old_spn, old_spn = old_spn - {y}; old_spn = ∅) do {
            u = SD(x, y) = ∞;

            if (routing_table(x) ∈ routing_table(cn))
                /* Delete the subscription item routing_table(x) from the subscription table of node cn */
                routing_table(cn) = routing_table(cn) - routing_table(x);

            /* Update the subscription probability tree of the subscription subnet */
            update_SPT();
        }
    }
}
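
The pruning step can be sketched in Python as follows. Routing entries are identified by node name here, and only nodes that were in the old membership but are missing from the new one are treated as offline; both choices are assumptions of this sketch. Setting the deviation to infinity and refreshing the subscription probability tree are left out.

```python
def reconstruct_routing_table(old_nodes, new_nodes, table):
    """Drop the entries of nodes that were in the old subnet membership but
    are missing from the new one; return the pruned table and those nodes."""
    changed = set(old_nodes) ^ set(new_nodes)      # symmetric difference
    departed = changed & set(old_nodes)            # left since last cycle
    pruned = [entry for entry in table if entry not in departed]
    return pruned, departed


table, gone = reconstruct_routing_table(
    old_nodes={"n1", "n2", "n3"},
    new_nodes={"n1", "n3", "n4"},
    table=["n1", "n2", "n3"],
)
print(table)  # ['n1', 'n3']
print(gone)   # {'n2'}
```

The newly joined node `n4` is not added here; according to the text, new nodes enter the table in the next construction cycle.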



3.3 Routing strategy 

A controlling node set (CNS) is a set of nodes in UP2S2 such that every other node in UP2S2 is adjacent to at least one node in the set. A node in the CNS is called a controlling node (CN); any other node is called a non-controlling node (non-CN). The CN is an important concept in subscription subnet-based routing and has been widely used in the layout problem of Pub/Sub nodes. In CBR, each node tends to publish events and to propagate subscriptions. To reduce the traffic congestion caused by excessive routing and forwarding messages, nodes are typically divided into subnets.
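
The adjacency condition in the CNS definition can be checked directly, as the Python sketch below shows. The greedy selection routine is a common heuristic added purely for illustration; the paper does not say how the CNS is chosen, so `greedy_controlling_set` should be read as an assumption rather than as the UP2S2 procedure.

```python
def is_controlling_set(neighbors, cns):
    """True if every node outside `cns` is adjacent to some node in `cns`."""
    cns = set(cns)
    return all(
        node in cns or cns & set(adjacent)
        for node, adjacent in neighbors.items()
    )


def greedy_controlling_set(neighbors):
    """Greedy heuristic (an assumption, not the paper's method): repeatedly
    pick the node that covers the most still-uncovered nodes."""
    uncovered = set(neighbors)
    cns = []
    while uncovered:
        best = max(sorted(neighbors),
                   key=lambda n: len(({n} | set(neighbors[n])) & uncovered))
        cns.append(best)
        uncovered -= {best} | set(neighbors[best])
    return cns


graph = {"a": ["x", "y"], "x": ["a"], "y": ["a", "b"],
         "b": ["y", "z"], "z": ["b"]}
print(is_controlling_set(graph, {"a", "b"}))  # True
print(is_controlling_set(graph, {"a"}))       # False
print(is_controlling_set(graph, greedy_controlling_set(graph)))  # True
```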

UP2S2 uses a routing strategy based on the subscription deviation metric. The routing strategy is not uniform across the whole network, but is based on the measured differences in subscription demands. The core idea of UP2S2 is as follows. According to subscription deviation, the sub_routing_table_constructing algorithm divides the whole unstructured P2P network into multiple subscription subnets. Each subscription subnet has a subscription probability tree. Events are forwarded toward the most likely subscription nodes. The UP2S2 event routing strategy has two constituents. First, events are routed through their own subnets. Second, events are disseminated along CNs.

In this section, we propose a CBR method based on CNS to describe the model of a CBR management layer that integrates the Pub/Sub mechanism with CNS. The method is described as follows.

First, we assume that UP2S2 is divided into three subnets, for which CNs a, b, and c are respectively responsible for publishing and subscription. In each subscription subnet, a subscription probability tree is established with one node as the root. This is shown in Fig. 1, where rectangles and circles represent CNs and non-CNs respectively.




Fig. 1: Construction of subscription probability tree
Second, UP2S2 calculates the subscription deviation values between each non-CN and its corresponding CN. The sub_routing_table_constructing algorithm then creates subscription routing tables for these non-CNs. The propagation of subscriptions in subnets is shown in Fig. 2, where the dashed arrows represent the direction in which subscriptions propagate.

Fig. 2: The propagation of subscriptions in subnets



Finally, each subnet forms an unstructured P2P network, and the CNs are connected to form UP2S2. UP2S2 is thus divided into two levels. In such a network, a CN adjacent to a non-CN is responsible for publishing the events of that non-CN, and a CN publishes events to the other CNs. The event and subscription routing strategies are shown in Fig. 3, where the solid arrows indicate event publishing.




Fig. 3: Event and subscription routing strategy
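
The two-level scheme can be illustrated with a small Python sketch that lists the hops an event takes. The data layout (`members` maps each CN to its non-CNs, `cn_of` maps every node to its CN) and the simplification that a CN delivers directly to each interested member, instead of following the subscription probability tree, are assumptions made for this example.

```python
def route_event(publisher, cn_of, members, interested):
    """Return the delivery path of an event as a list of (sender, receiver)
    hops: publisher -> its CN -> other CNs -> interested subnet members."""
    hops = []
    home_cn = cn_of[publisher]
    if publisher != home_cn:
        hops.append((publisher, home_cn))      # non-CN hands event to its CN
    for cn in sorted(members):
        if cn != home_cn:
            hops.append((home_cn, cn))         # CN-to-CN dissemination
    for cn in sorted(members):
        for node in sorted(members[cn]):
            if node != publisher and node in interested:
                hops.append((cn, node))        # delivery inside the subnet
    return hops


members = {"a": ["a1", "a2"], "b": ["b1"], "c": ["c1", "c2"]}
cn_of = {n: cn for cn, nodes in members.items() for n in nodes}
cn_of.update({cn: cn for cn in members})
print(route_event("a1", cn_of, members, interested={"a2", "c1"}))
# [('a1', 'a'), ('a', 'b'), ('a', 'c'), ('a', 'a2'), ('c', 'c1')]
```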



3.4 The cost of obtaining subscriptions 

Obtaining users' subscriptions is a prerequisite for UP2S2 to establish subscription routing tables. Because subscriptions are diverse and differ from one another, the cost of obtaining them is a problem that cannot be ignored. Users' subscriptions also change over time, which calls for a more rational approach to obtaining them. Without affecting the time and space complexity, UP2S2 uses a method of mining association rules over quantitative attributes. By classifying users' subscriptions, it constructs subscription sets with different functional attributes. Then, combined with the extensible transformation principle, UP2S2 can deduce new if-then rules. In this process, the costs of UP2S2 lie mainly in collecting and processing subscriptions: it divides the users' subscription data into several property sets and determines the conditional attribute sets and the conclusion attribute sets. Computing support and confidence is UP2S2's additional overhead.
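
For reference, support and confidence of an if-then rule can be computed as in the Python sketch below, using the standard association-rule definitions: support is the fraction of records that contain both the condition and the conclusion, and confidence is that count divided by the number of records that contain the condition. The record format and the attribute values are hypothetical and are not taken from the paper.

```python
def support_and_confidence(records, condition, conclusion):
    """Support and confidence of the rule `condition -> conclusion`, where
    both sides are sets of attribute values and each record is a set."""
    condition, conclusion = set(condition), set(conclusion)
    n_condition = sum(1 for r in records if condition <= r)
    n_both = sum(1 for r in records if (condition | conclusion) <= r)
    support = n_both / len(records)
    confidence = n_both / n_condition if n_condition else 0.0
    return support, confidence


# Hypothetical subscription records (each is a set of attribute values).
records = [
    {"topic=sports", "region=eu"},
    {"topic=sports", "region=eu"},
    {"topic=sports", "region=us"},
    {"topic=news", "region=eu"},
]
print(support_and_confidence(records, {"topic=sports"}, {"region=eu"}))
# (0.5, 0.6666666666666666)
```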

IV. VALIDATION AND EVALUATION

According to the needs of P2P network operations, we design simulation experiments and a comparative analysis in which UP2S2 is compared with the BF routing strategy. The experiments examine three claims:

1) According to the subscription tables, UP2S2 routes events more quickly and accurately;

2) UP2S2 can save network bandwidth and reduce unnecessary network overhead;

3) UP2S2 constructs the subscription tables with quick response and high efficiency.

We validate UP2S2 by simulation using PeerSim, a discrete event simulator for large-scale distributed systems. In the following experiments, we assume a P2P network of 10000 nodes, organized into 100 subscription subnets of 100 nodes each. To simulate a realistic network environment, the event-publishing nodes are uniformly distributed. We assume that the number of event nodes is 800, the event subscription rate is ESR=1%, and the time-to-live value of event and subscription messages is 10.

4.1 Accuracy of event routing 

The purpose of this experiment is to verify UP2S2's ability to route events quickly. This capability includes the success rate of event routing, the event routing time in different subscription subnets, and the hit rate of event matching. Figure 4 shows the success rate of event routing vs. periods of time (POT).




Fig. 4: The success rate of event routing vs. POT



As shown in Figure 4, the BF event routing accuracy reaches its peak when POT=4. Because the network is flooded with a large amount of query information, a large number of nodes exhaust their energy resources when POT=1, and the accuracy of BF event routing decreases by nearly 20%. Because UP2S2 is divided into subnets according to users' subscriptions, events are routed within a subnet with higher accuracy, up to 75%. As POT decreases, the accuracy increases monotonically. This indicates that the event routing accuracy of UP2S2 is higher than that of the BF algorithm.



4.2 Cost of redundant events of network 

The purpose of this experiment is to verify that UP2S2 can save network bandwidth and reduce unnecessary network overhead. Figure 5 shows the number of redundant events vs. POT.

Fig. 5: The redundant events vs. POT



As shown in Figure 5, UP2S2 shows an increase of 100 redundant events in every event flooding. The number of redundant events in UP2S2 is 75% lower than that of BF, and it tends to increase as the POT value decreases. Eventually, the number of redundant events stabilizes. This indicates that, compared with the BF algorithm, UP2S2 has a clear advantage in reducing the redundant events generated by flooding and in saving network resources.

4.3 Subscription table construction time 

The purpose of this experiment is to verify that UP2S2 can construct subscription tables quickly and efficiently. Figure 6 shows the subscription table construction time vs. POT.




Fig. 6: Subscription table construction time vs. POT



www.theijes.com 



The IJES 




Ontology-Based Routing for Large-Scale Unstructured P2P ... 



As shown in Figure 6, UP2S2 outperforms BF in terms of subscription table construction time. In particular, when the number of events is small, the subscription table construction time of UP2S2 is nearly 30% less than that of BF.

V. CONCLUSION

In this paper, we propose UP2S2, a large-scale ontology-based Pub/Sub system for unstructured P2P networks, to provide an architecture for Pub/Sub routing. UP2S2 is based on an event and subscription subnet model. We design and implement the algorithms for UP2S2 to construct and reconstruct subscription routing tables, and derive conclusions from simulation experiments. The results show that UP2S2 routes events more quickly and accurately than the BF routing strategy.